Decision Tree Simulator

Conditional entropy is a measure of the impurity, uncertainty, or randomness remaining in a random variable given that another random variable is known.

In the context of classification problems, conditional entropy quantifies the remaining uncertainty about the target variable \( Y \), whose values are the class labels, given a category/value \( x \) of an attribute \( X \).

  • Note: The term "category" (of an attribute) usually refers to categorical or nominal values, while "value" (of an attribute) usually refers to numerical values. In practice, this application uses only categorical attributes, but both terms appear in the theoretical descriptions.


For a binary classification problem, conditional entropy \( E(Y|X) \) is calculated using the following formula:

\[ E(Y|X) = \sum_{x \in X} p(x) E(Y|X = x) \]

Where:

  • \( X \) is the set of categories/values of the first attribute
  • \( Y \) is the set of categories/values of the class attribute
  • \( p(x) \) is the proportion of instances in the dataset that have category/value \( x \), i.e., their count divided by the dataset size
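
To make the formula concrete, here is a minimal Python sketch that computes \( E(Y|X) \) from a list of (attribute category, class label) pairs. The function name `conditional_entropy` and the sample data are illustrative only and are not part of this application:

```python
from collections import Counter, defaultdict
from math import log2

def conditional_entropy(pairs):
    """E(Y|X) for (x, y) pairs: the weighted sum of per-category entropies."""
    n = len(pairs)
    labels_by_category = defaultdict(list)
    for x, y in pairs:                      # group class labels y by category x
        labels_by_category[x].append(y)
    total = 0.0
    for labels in labels_by_category.values():
        p_x = len(labels) / n               # p(x)
        counts = Counter(labels)
        # E(Y|X = x), the per-category entropy defined below
        e_x = -sum((c / len(labels)) * log2(c / len(labels))
                   for c in counts.values())
        total += p_x * e_x
    return total

# Illustrative data: a perfectly pure category ("rain") contributes no entropy.
data = [("sunny", "yes"), ("sunny", "no"), ("rain", "yes"), ("rain", "yes")]
print(conditional_entropy(data))            # 0.5
```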

The conditional entropy for an individual category/value \( x \) of attribute \( X \) (shown as \( E(Category) \) in the calculator below) is calculated with the following formula:

\[ E(Y|X = x) = - \sum_{y \in Y} p(y|x) \log_2(p(y|x)) \]

Where:

  • \( p(y|x) \) is the proportion of instances in the dataset with attribute category/value \( x \) that also have the class label \( y \)

The sum runs over all possible categories/values \( y \) of the class attribute \( Y \).
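
Equivalently, \( E(Y|X = x) \) can be computed directly from the per-class instance counts of a single category, which is what the calculator's \( E(Category) \) column reports. Below is a small Python sketch; the function name `category_entropy` is illustrative:

```python
from math import log2

def category_entropy(class_counts):
    """E(Y|X = x) from the per-class instance counts of one category x."""
    n = sum(class_counts)
    # Zero counts are skipped, since p * log2(p) tends to 0 as p -> 0.
    return -sum((c / n) * log2(c / n) for c in class_counts if c > 0)

# A category with 3 instances of Class 1 and 1 instance of Class 2:
print(category_entropy([3, 1]))  # ≈ 0.8113
```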

Conditional Entropy Calculator

[Interactive calculator: for each category of an example attribute, enter the instance counts for Class 1 and Class 2 as positive integers; the tool reports the Ratio, \( E(Category) \), and \( CE(Attribute) \) values.]